Improved estimation of cluster mass profiles from the cosmic microwave background 
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We develop a new method for reconstructing cluster mass profiles and large-scale structure from the cosmic 
microwave background (CMB). By analyzing the likelihood of CMB lensing, we analytically prove that stan- 
dard quadratic estimators for CMB lensing are unbiased and achieve the optimal condition only in the limit of 
no lensing; they become progressively biased and sub-optimal, when the lensing effect is large, especially for 
clusters that can be found by ongoing Sunyaev-Zel'dovich surveys. Adopting an alternative approach to the 
CMB likelihood, we construct a new maximum likelihood estimator that utilizes delensed CMB temperature 
fields based on an assumed model. We analytically show that this estimator asymptotically approaches the opti- 
mal condition as our assumed model is refined, and we numerically show that as we iteratively apply it to CMB 
maps our estimator quickly converges to the true model with a factor of ten less number of clusters than standard 
quadratic estimators need. For realistic CMB experiments, we demonstrate the applicability of the maximum 
likelihood estimator with tests against numerical simulations in the presence of CMB secondary contaminants. 
With significant improvement on the signal-to-noise ratio, our new maximum likelihood estimator can be used to 
measure the cluster-mass cross-correlation functions at different redshifts, probing the evolution of dark energy. 

PACS numbers: 98.62.Sb, 98.70.Vc, 98.80.Es 



I. INTRODUCTION 

As the most distant observable sources, the cosmic mi- 
crowave background (CMB) anisotropics provide a unique 
channel to probe the universe after the cosmological recombi- 
nation epoch. In particular, weak gravitational lensing of the 
CMB can be used to map the matter distribution in the uni- 
verse at higher redshift than weak lensing of faint background 
galaxies can ever achieve. Recent work lUsSHl has focused 
on measuring the lensing signature in the CMB by large-scale 
structure between the last scattering surface and the present 
universe, but relatively little attention has been paid to weak 
lensing of the CMB by clusters of galaxies. 

The abundance of massive clusters is exponentially sensi- 
tive to the growth of the underlying matter distribution, and 
hence it has been recognized as a powerful probe of the evo- 
lution of dark energy (e.g., fst]). However, the constraining 
power as a cosmological probe can be only realized, if the 
cluster masses are accurately measured. To achieve this goal, 
many cluster surveys are designed to detect massive clusters 
and measure their mass using the Sunyaev-Zel'dovich (SZ) ef- 
fect, and some of the planned surveys are already operational 
using the South Pole Telescope (SPT), and the Atacama Cos- 
mology Telescope (ACT). Weak lensing of the CMB can be 
applied to the same clusters found in the SZ surveys without 
additional observations, providing independent measurements 
of their mass. Furthermore, the CMB provides the highest 
redshift source plane with precision measurements of its dis- 
tance, which can be combined with galaxy weak lensing mea- 
surements of the same lensing clusters to obtain angular diam- 
eter distance ratio estimates that are independent of the mass 
distribution, substantially increasing the leverage to constrain 
cosmological parameters [6J. 
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Gravitational lensing by clusters imprints a unique signa- 
ture in the CMB anisotropics. On arcminute scales, the pri- 
mordial CMB anisotropics decay exponentially due to the 
photon diffusion from the baryon-photon fluid around the 
recombination epoch |7], and to a good approximation the 
CMB can be considered as a pure temperature gradient on 
small scales. Based on this approximation, Seljak and Zal- 
daiTiaga fs'] showed that clusters create dipole-like wiggles 
in the CMB temperature by remapping the otherwise smooth 
gradient field, and this unique feature can be used to isolate 
the lensing effect by clusters and to reconstruct the deflection 
angle, once the temperature gradient is separately measured 
on large scales. Vale, Amblard, and White |9] and Holder 
and Kosowsky 1 10] used A^-body simulations to model realis- 
tic lensing clusters, and they found that the mass reconstruc- 
tion for individual clusters is compromised, since it is hard to 
measure the large-scale temperature gradient accurately and 
secondary anisotropics in the CMB can partially mimic the 
lensing signature. 

However, it has been realized that one can apply the same 
technique developed for reconstructing large-scale structure 
to clusters of galaxies, measuring the statistical properties 
of a sample of clusters. Unlike galaxy weak lensing, CMB 
anisotropics have no characteristic shape, even statistically, 
from which the deviation is a measure of the lensing effect. 
Gravitational lensing, however, gives rise to a deviation of 
the two-point correlation function of the CMB temperature 
anisotropics from statistical isotropy. The standard technique 
is to construct a lensing estimator that is quadratic in observed 
temperature anisotropics, measuring the correlation between 
different Fourier modes, which is directly proportional to the 
lensing effect ifTTll . 

This method is easy to implement in analyzing real data 
compared to the full likelihood analysis (l^ and no separate 
measurement is required to obtain th^ large-scale temperature 
gradient. However, Maturi et al. jisil showed that standard 
quadratic estimators need a modification to be an unbiased es- 
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timator in a region around massive clusters. Hu, DeDeo, and 
Vale 1 14] quantitatively demonstrated that standard quadratic 
estimators based on the linear approximation ignore higher- 
order terms in the lensing effect that coherently contribute 
to the lensing reconstruction, and hence the reconstruction 
is biased low when the lensing effect is large. Furthermore, 
they proposed modified quadratic estimators that remove the 
higher-order terms in violation of the linear approximation 
by low-pass filtering observed temperature fields, and they 
showed that the modified quadratic estimators recover clus- 
ter mass profiles with no significant bias. However, the cutoff 
scale of the low-pass filter is somewhat arbitrary and it de- 
pends on the lensing effect, which we want to measure with 
the estimators. 

Here we develop a new maximum likelihood estimator for 
reconstructing cluster mass profiles and large-scale structure 
by analyzing the likelihood of CMB lensing. Our approach 
is similar in making full use of the likelihood information to 
one advocated by Hirata and Selj ak [ 1 2] . While they derive an 
analytic expression for a maximum likelihood estimator, it is 
impractical to apply to a realistic problem, because the solu- 
tion is too general and computationally expensive. However, 
our maximum likelihood estimator is different from theirs and 
it is easy to use in practice, because we adopt an alternative 
approach to setting up the likelihood: it takes a similar form 
of standard quadratic estimators and it approaches the optimal 
condition as it is iteratively applied to CMB maps. Further- 
more, we show that our maximum likelihood estimator can 
reconstruct cluster mass profiles with a factor of ten less num- 
ber of clusters than standard or modified quadratic estimators 
need. 

The rest of the paper is organized as follows. We first de- 
rive a quadratic estimator, accounting for the telescope beam 
effect in Sec. This consideration makes a difference com- 
pared to the usual practice in the literature, where quadratic 
estimators are often applied to beam deconvolved CMB maps. 
In Sec.|III]we analytically show that the quadratic estimators 
are unbiased and optimal only when the lensing effect van- 
ishes, and why the modified quadratic estimators outperform 
the standard quadratic estimators when the lensing effect is 
large. Based on this observation, we construct a delensed 
temperature field and derive a maximum likelihood estima- 
tor using the delensed temperature field. We demonstrate its 
applicability to realistic CMB experiments using numerical 
simulations in Sec.|IV] We discuss the impact of the telescope 
beam and instrumental noise in the delensing process and we 
conclude in Sec. [V] 

In this paper we will only consider lensing estimators based 
on CMB temperature anisotropics, since the planned surveys 
are not yet sensitive to CMB polarization anisotropics on ar- 
cminute scales. However, it is straightforward to extend our 
formalism to lensing estimators based on CMB polarization 
anisotropics. Throughout the paper we assume a flat ACDM 
universe with the matter density parameter ri„ift-^ — 0.127, 
the baryon density parameter ilbh^ — 0.0222, the Hubble 
constant h = 0.73, the spectral index Ug — 0.95, the opti- 
cal depth to the last scattering surface t = 0.09, and the pri- 
mordial curvature perturbation amplitude Ag = 2.5 x 10~^ 



(corresponding to the matter power spectrum normalization 
as = 0.75), consistent with the recent cosmological parame- 
ter estimation (e.g., ifTsifTelfTTll ) 



II. FORMALISM 

Here we describe our notations for weak lensing of the 
CMB and derive a quadratic estimator for CMB lensing re- 
construction. 



A. Weak Lensing of the CMB 

Gravitational lensing deflects light rays as they propagate 
through fluctuating gravitational fields, and the deflection vec- 
tor d(n) at the angular position n on the sky is related to 
the line-of-sight projection of the gravitational potential tp as 
d(n) = V0(n), where the projected potential is 



(ft) = -2 



dD 



D.-D 
DD, 



^iDh,D), 



(1) 



V is the derivative with respect to ft, and is the comoving 
angular diameter distance to the last scattering surface. Here 
we have assumed a flat universe and c = 1. The projected 
potential is further related to the convergence k as V^0(n) = 

-2K(ft). 

Since gravitational lensing conserves the surface brightness 
of diffuse backgrounds, the lensed temperature field r(ft) of 
the CMB is simply the intrinsic (unlensed) temperature field 
r(ft) remapped by the deflection vector. 



f{n) = T ft + V0(ft) 



(2) 



We will use notation with (or without) tilde to represent lensed 
(or unlensed) quantities. Note that we mainly work in the 
Ray leigh- Jeans tail and express the surface brightness in terms 
of temperature. 

In a sufficiently small patch of the sky, it significantly 
simplifies the manipulations to work in Fourier space [see 
[Tl,[l^|2^ for all-sky formalism]. In Fourier space the lensed 
temperature is 



T, 



-il-n 



(3) 



d-'h T(n) e"" 



where we Taylor expanded f\ to the first order in (pi. 
We kept the same notation for Fourier components, while the 
functional dependence is indicated as a subscript (e.g., T(ft) 
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and T\ are Fourier counterparts). The rms deflection angle 
(d • d) is a few arcminutes and the deflection power peaks 
at a few degree scale, comparable to the angular sizes of 
clusters. However, the large-scale deflection field is coherent 
over the scales of the temperature fluctuations, resulting in an 
unobservable overall shift of the temperature field MM, and 
the linear approximation remains valid. In Sec. |III]we discuss 
the limitation of this approximation when the lensing effect is 
large in a region around massive clusters. 

Since the intrinsic CMB is Gaussian and isotropic, the sta- 
tistical properties of the temperature field can be completely 
described by the power spectrum C; , 

{Ti,TQ = {2TT)'S{h~h)Q„ (4) 



In reality, one needs to consider other contributions to 
jiobs^ such as residual foregrounds, point radio sources, 
and CMB secondary anisotropics. We will only consider 
secondary contributions in Sec. IIV CI 



B. Quadratic Estimator 

Here we consider a convergence estimator k{h) that is 
quadratic in the observed temperature field, accounting for 
telescope beam and detector noise.' We require that the esti- 
mator be unbiased when averaged over an ensemble of CMB 
maps, {k{n)) = K{n). With these conditions, the estimator 
takes the general form in Fourier space 



where the asterisk represents complex conjugation and S is 
the Dirac delta function. Analogously, we define the pro- 
jected potential power spectrum Cf"^. Thus the deflection 
and the convergence power spectra are Cf'^ — Pcf"^ and 
= l^Cf'^'/A, respectively. Note that C/"^ can always 
be defined in this way, though it may be an incomplete de- 
scription of the statistical properties of the projected potential 
when 01 is non-Gaussian. Finally, the power spectrum of the 
lensed temperature field is 



G=[l-/^i?]Q+|^[(l-l')-lf 



i'C.r, (5) 



where R = (IMtOJ din / l^Cf''' is the half of the rms de- 
flection angle 111 [H . 

In practice, the observed temperature field has two addi- 
tional contributions: detector noise independent of the sig- 
nal, and telescope beam convolving the signals from different 
patches of the sky. We assume that the detector noise is white, 
so that the noise power spectrum is constant. 



Nl f dHi 



F(ll,l2)T,°'^^Tlf^ 



(9) 



where I2 = L — li and Nl is a normalization coefficient, 
which only depends on L = |L|. The functional form of 
F(li, I2) can be obtained by minimizing the variance of kl 
and imposing the normalization condition 



f(li,l2)= ^^'^^^i\+^,^^^'-^ e-i'i^'e-^'i^i, (10) 



and the normalization coefficient is 



1 1 f dHi [L-liG, + 



Finally, the variance of the estimator is 



(11) 



C; = Ay = CTpix^T^ : (O) 

*pix 

where o-pix is the rms error in each pixel of the detector in 
units of ^K, /sky is the fraction of the survey area on the 
sky, and iVpix is the total number of detector pixels ll23ll . 
Convolution is simply a multiplication in Fourier space, and 
the beam factor for a simple Gaussian beam we consider is 
Bi = exp [— ^^■^c^]- The beam width at is related to the 
full- width half-maximum (FWHM) as ai, — ^fwhm / VS\n2. 
The observed temperature field and its power spectrum are 
then 



{k^kl) = (2^)2 5{L - L')(Cr + (12) 

where N'^'^ = L'^Nl/4: is the noise power spectrum of kl. 
One can think of Cl'^/N^'^ as a signal-to-noise ratio, and 
the reconstruction becomes difficult at the angular scale L, 
where Cl'^ ~ N^'^. Given experimental specifications, the 
noise power spectrum N'^'^, as a function of the intrinsic CMB 
power spectrum Cl, becomes smallest, when there exists sub- 
stantial power in Cl at the scale of interest, with its shape 
deviating from the scale-invariance (L'^Cl =constant) Ii24i1 . 



^"bs ^ Cie 



N 



N 



(7) ' We will use quantities with hat to represent estimators of the quantities 

without hat, e.g., a convergence estimator is denoted as k and a true conver- 

(8) gence field is denoted as k. However, this notational convention should not 
be confused with that used for temperature fields: T, T, T°^^, and T rep- 
resent the intrinsic (unlensed), the lensed [Eq. j2)], the observed [Eq. 0], 
and the delensed [Eq. )27H temperature fields, respectively. 
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FIG. 1: Convolution filter H{6) as a function of separation 9 = |n| 
for CMB experiments with (jpix — bfiK and lO^K in Sec. lIVI The 
insets show details of H{0) at the center (left) and its tail (right). 



Our estimator recovers the general form of the standard 
quadratic estimators as ab 0, and Nl corresponds to 
the noise power spectrum of a deflection estimator cIl = 
2L ki,/L^ used in the literature ifTTIl . 

The estimator can be decomposed as two Wiener-filtered 
temperature functions in real space, which essentially corre- 
lates the gradient of the lensed temperature field with the un- 
lensed temperature field to isolate the lensing effect, 



G(n) = 
W{n) = 



ht; 



obs 



Ci 

^obs 



(2^) 



Ti 



2 ^1 



obs 



^obs 



e 2 



(13) 
(14) 



and the convergence estimator can be expressed in terms of 
G(n) and W{h) as 



Nl 



I' 



iL ■ / d^n G{n)Wih) e 



(15) 



-V • [G(n)W^(n)] - 



d^L 



(27r)2 N, 



e^^ " (16) 



J d^m H{ih — h)k{m.). 



The divergence of the two Wiener-Filtered functions is 
a convolution of the convergence estimate k{n) and the filter 



H{n) 



d^L -1 
(2^)2 TVl 



(17) 



Figure [T] plots the filter H{0) as a function of separation 9 = 
|n| for experiments with <Tpix 5/iK and lOfiK, to which 
we apply quadratic estimators in Sec. |IV] The filter peaks at 
the center and its width is ~ 3', roughly set by the scale that 
the intrinsic CMB and detector noise power spectra become 
comparable. While the filter is highly oscillating at its tail, it 
is negligible atd > 10' due to the large weight near the center. 
A factor of two change in Cpix has little impact on the width 
of the filter, because the crossing scale is already at the CMB 
damping tail. 



ni. MAXIMUM LIKELIHOOD ESTIMATOR 

In this section, we analyze the likelihood of CMB lensing 
by singular isothermal clusters. We first derive a quadratic 
estimator for singular isothermal clusters and compare the es- 
timator to the optimal estimator from the likelihood. With the 
simple singular isothermal model, our analysis will be car- 
ried out analytically, showing that (1) the standard quadratic 
estimators are unbiased and optimal in the limit of no lens- 
ing, (2) they progressively become biased and sub-optimal 
when the lensing effect increases, and (3) why the mod- 
ified quadratic estimators perform better than the standard 
quadratic estimators. Finally, we develop a unbiased maxi- 
mum likelihood estimator to reconstruct cluster mass profiles 
as well as large-scale structure. We demonstrate its applica- 
bility to CMB experiments with tests against numerical simu- 
lations using more realistic cluster models in Sec. lIVI 



A. Quadratic Estimator for a Singular Isothermal Cluster 



This approach of using the two Wiener-filtered functions is 
more convenient for computing kl by using Fast Fourier 
Transform (FFT) routines than by directly computing Eq. 
Furthermore, it is more physically intuitive than the general 
derivation, though the latter has clear advantage in its trans- 
parency and understanding the uniqueness of the functional 
form I2). A modified quadratic estimator can be con- 
structed by removing the signals in Eq. (fTsT i at I > /cut, while 
Eq. (fT4b remains unchanged. 

To better understand how quadratic estimators operate, we 
Fourier transform and rearrange Eq. ( fTsT i as 



A singular isothermal cluster has a density profile p{r) cx 
and its enclosed mass increases with r, which requires 
truncation at some radius to be a viable model for real clus- 
ters. However, this model has advantage in its simplicity: its 
properties are described by one parameter, Einstein radius 



Oe ^ 4^g' * , (18) 
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where a is one-dimensional velocity dispersion of a clus- 
ter and is the comoving angular diameter distance to 
the lensing cluster CMB lensing has a well-defined single 
plane of the source redshift and the comoving angular diam- 
eter distance to the last scattering surface Di, — 14.12 Gpc 
is now measured with less than 1% uncertainty iITtIi . The 
convergence is K(fi) — O-e/^O and the deflection vector is 
d(n) — —O-Ejii- given the angular separation 9 = |n| from 
the origin in a cluster centric coordinate. When a virial radius 
i?vir is defined as the radius inside which the mean density is 
200 times the cosmic mean matter density, a singular isother- 
mal cluster of mass M = lO^^/i~^M0 within the virial radius 
at — 1 has an Einstein radius 0e = S'.'O and a velocity dis- 
persion a = 2.0 X 10^'^(= 610 kms^^), and they scale as 
6'e oc and a cx M^/\ 

A quadratic estimator 9^ for singular isothermal clusters 
can be readily derived using the method described in Sec. Ill Bl 
but here we take an idealized approach for the purpose of com- 
parison, where we assume dpix = o"b = 0. Under the condi- 
tion that the estimator is unbiased {9'^) = 6'e and it has the 
minimum variance, the quadratic estimator is 



1 



(2^)27 (27r)2 



(19) 



(I1+I2) 



Ci^Ci^ 



II1 + I2 



with the normalization coefficient 



(27r)2 J (27r)2 



(20) 



I11 + 12P 



The variance of the estimator is ((^^^ — 6'e)(^e^ — (^e)) = 
1 jJ-. Here we Taylor expanded T\ and kept terms only to the 
first order in 9^ in deriving (f^ . 



B. Relation to the Optimal Estimator 



For convenience, we take a negative logarithm of P{T\9^^ 
and call it likelihood. 



L{f\9-^) = -lnPif\9^) (22) 
= if (n) C-\h,h'\9^^) f(h') + ^lndetC{9^), 



where the summation over h and n' is implicitly as- 
sumed and hereafter we will suppress the angular dependence 
for simplicity. In general, the likelihood is a functional with 
its argument of a scalar field, such as K(n) or 0(n). However, 
in our case it reduces to a function with its argument of a 
scalar 6*^ , substantially simplifying the manipulation. 
We take a derivative of C with respect to 0g\ 



dC 
89^ 



-if r-i r-i f 

;2 



(23) 



dPh f<Ph Ti.Ti, n{hCi, + hCi,) ■ (li + I2) 



(27r)2y(27r)2Q,Q, 



111 



where we computed the derivative to the first order in 
. Since gravitational lensing only redistributes the intrinsic 
CMB, the last term (log determinant) in Eq. (|22] | is inde- 
pendent of 9^^ and hence the derivative with respect to 9^ 
vanishes in Eq. ( |23] ). However, in the presence of non-white 
instrumental noise, and/or other secondary contaminants, 
the derivative acquires a nonzero value but it is in general 
negligible compared to the quadratic term in Eq. ( l23T l. We 
will neglect this effect in the remainder of this paper In 
the presence of significant contaminants from secondaries, 
the assumption that the likelihood function is Gaussian 
becomes invalid before the log determinant term becomes 
non-negligible. 

With the derivative of C, we can compute the Fisher infor- 
mation matrix 



09^^ 



dC dC 

89^89^ 



(24) 



The likelihood function P{T\9^£) simply represents the 
probability that a singular isothermal model with 9^ can have 
the lensed temperature field T(n). Since the intrinsic CMB 
follows a Gaussian distribution and gravitational lensing only 
remaps the intrinsic CMB, the distribution of T(n) is also 
Gaussian and its statistical properties are fully described by 
the covariance matrix of T'(n) 

C{n,h') = (f (n)f (n')) = J ^ e^H-fl'). (21) 



where for the second equality we used the normaUzation con- 
dition of the likelihood function 1 = J df P(f'|6lg') ^ 
J dT e~'~ . Within the Gaussian approximation, T can be 
evaluated at any value of Note that T is identical to the 
normalization coefficient in Eq. (|20] i. 

In statistical parameter estimation, there exists a power- 
ful theorem, known as the Cramer-Rao inequality that error 
bars in a parameter estimation have a definite lower bound 
o'(^'e') > set by the Fisher matrix. Moreover, this 

theorem provides a necessary and sufficient condition for an 
estimator to saturate the Cramer-Rao inequality, i.e., to be an 
optimal estimator ff^^ Ii25i1 . 
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(25) 



Now it is apparent that only in the limit of no lensing (the 
true Einstein radius Oe = 0^ = 0) does the quadratic estima- 
tor become an optimal estimator 9'^^ with the smallest 
variance attainable from the data. Conversely, becomes 
progressively biased and sub-optimal as the lensing effect in- 
creases. This can be also understood by the validity of the 
linear approximation: since the quadratic estimator is con- 
structed to be unbiased and to minimize the variance when 
T\ is expanded to the linear order in 01, it is natural to expect 
that this condition breaks down when higher-order terms in 
(j>\ become dominant over the linear order term. The modi- 
fied quadratic estimator, on the other hand, removes the an- 
gular modes of the signals at I > Icut by explicitly setting 
the integrand zero in Eq. ( fT9] ), where the linear approximation 
breaks down, and this process helps suppress the contributions 
from the higher-order terms in (pi because the higher-order 
terms are related to multiple integrals over the modes that are 
suppressed most. Precisely for this reason could the mod- 
ified quadratic estimators be more robust than the standard 
quadratic estimators even when the lensing effect is large. 

However, the modified quadratic estimator requires a rather 
arbitrary choice of the cutoff scale ^cut, which depends on the 
lensing effect, though it may be possible to calibrate against 
simulations ifTill . Furthermore, the removal of the lensing sig- 
nals at I > Icut inevitably results in lower signal-to-noise ra- 
tio, making the reconstruction noisier. We discuss this issue 
with numerical simulations in Sec. lIVB] 



eralize this approach to clusters with arbitrary mass distribu- 
tions. 

Consider a model with 9"^ and its deflection field d™ (n) = 
~9'^n. We construct a delensed temperature field T(n) by 
delensing the observed T{n) with d™(n), and T(n) is related 
to the intrinsic temperature field T(n) as 



T{n) = T(n-d™) 

= T(fi-d" + d) 



(27) 



r[(l + A)n] 



with A = — 6'e. Now we can write the Ukehhood 
in terms of the delensed temperature field T{h) 



£{f\9^) = ^fi9^) fi9'i) + ilndetC, (28) 



where we emphasized the dependence of r(n) on 9"^, and C 
is the covariance matrix of T(n). Taking a derivative of C 
with respect to 9^ gives 



dC 



dT 
89^ 



_i df 
89^ 



(29) 



dH, [(fh Ti.Ti, nihCi, + hCi,) ■ (li + h) 



2n) 



Ill 



C. Maximum Likelihood Estimator 

Given the Gaussian probability distribution of the CMB, 
the likelihood retains all the information of the observed data. 
Even when there exists no optimal estimator, one can always 
find an estimator, if not analytically, that maximizes the like- 
lihood: the maximum likelihood estimator 9^^ is the solution 
of 



dC 



d9li 



(26) 



However, this equation is highly non-linear in general and re- 
quires approximations to be solved even numerically. Equa- 
tions (l25b and ( |26] | show that an optimal estimator is always 
the maximum likelihood estimator. However, note that while 
the converse is not true in general, the maximum likelihood 
estimator asymptotically approaches to the optimal condition. 

Having understood that the quadratic estimator becomes an 
optimal (and maximum likelihood) estimator in the limit of 
no lensing in Sec. IIII Bl we present an alternative approach 
to modeling the likelihood and derive a new maximum likeli- 
hood estimator for singular isothermal clusters. We then gen- 



The second equality is obtained by evaluating the derivative 
at A = 0. Assuming that our initial model with 9^ is a good 
approximation to the true model with 6'e (A* = 9^ — 9^, — 0), 
the likelihood can be expanded around A^ 



C = 



dC 



391 



1 / 52/: 



2 V^^'e^ 



(A - A.) 
(A- AO' + 0(A3), 



(30) 



and we can use the standard Newton-Raphson method 
to solve Eq. ( |26] l and obtain a maximum likelihood estimator 



qML 



A. 



gML 



(31) 



dC 

09^ 



i7n2 



It is important to note that the validity of our solution 
for 9^^ is independent of the linear approximation, but the 
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convergence of depends on the goodness of 6^ to ^e- 
Eq. ( [3T1 ) still involves computationally intensive evaluations 
of the second derivative, or the curvature matrix. We further 
simplify by replacing the curvature matrix with its 

ensemble average, Fisher matrix 



2^2 



(27r)2 

llC/i + 12^/2 



(32) 



(li + I2) 



111 



and by evaluating the derivatives at A^, 
our new maximum likelihood estimator is 



0. Finally, 



qML 



1 



(33) 



(27r)2 J (27r)2 
Ci.Ci, II1 + I2P 



This equation is readily recognizable as the standard 
quadratic estimator in Eq. ( fT9b . except Ci and Ti replaced 
with Ci and T\. The resemblance should not be surprising, 
and in hindsight one could have expected this outcome given 
the result in Sec. HUB I the quadratic estimator becomes 
optimal when the lensing effect is vanishingly small; as we 
delens T(n) well enough that T{n) is close to r(n), the 
residual lensing effect in T{h) is substantially reduced and 
therefore the maximum likelihood estimator takes the form of 
the quadratic estimator, returning diminishing change of the 
second term in Eq. i.e., 9^^ ~ ~ 9e- 

We want to emphasize that this new estimator in the form 
of quadratic estimators is derived by iteratively solving for 
the maximum likelihood in Eq. ( |26] | and updating the initial 
model 9^ as in the standard Newton-Raphson method, i.e., 
it is a maximum likelihood estimator and is independent of 
the linear approximation, to which the validity of the stan- 
dard quadratic estimator is limited. One may be concerned 
about replacing the curvature matrix with the Fisher matrix in 
Eq. ( [33] l and obtaining a solution quadratic in T\ instead of a 
solution rational in T\ (quadratic in T\ both in numerator and 
in denominator). However, both procedures guarantee that the 
correct solution of Eq. ( |26l ) is iteratively found reaching the 
same peak of the likelihood, while the error estimation of pa- 
rameters is approximated by using the Fisher matrix, rather 
than the full curvature matrix. In Sec.|IV]we demonstrate that 
this is a good approximation and the initial model converges 
quickly to the true model. Given the nomenclature of the ex- 
isting quadratic estimators, now let us call our new maximum 
likelihood estimator an improved quadratic estimator^ 



In practice we can use the standard quadratic estimators to 
obtain an initial model and then proceed with our improved 
quadratic estimator to refine the solution, even when the lens- 
ing effect is large. In general, the reconstruction of cluster 
mass profiles is too noisy to provide a good initial model. 
However, we can adopt an initial model for clusters from other 
observations (e.g., galaxy weak lensing and X-ray measure- 
ment) or theoretical expectations (e.g., Navarro-Frenk- White 
(NEW) profiles [2^]). As opposed to the modified quadratic 
estimators, there is no arbitrary choice of /cut in our method. 

The toy model developed here can be readily generalized 
and our improved quadratic estimator can be used to recon- 
struct mass profiles of realistic clusters and large-scale struc- 
ture. However, in the presence of the telescope beam and de- 
tector noise, the delensing process becomes non-optimal be- 
cause it does not commute with the beam smoothing. In the 
absence of detector noise, one can deconvolve the beam fac- 
tor, delens the temperature field, and convolve the beam again, 
which can solve the problem of non-commutativity. 

However, in the presence of detector noise, the beam de- 
convolved noise can produce unwanted power on all scales 
when it is delensed due to the non-white power below the 
beam scale. One can in principle filter out or remove these 
small scales before delensing to mitigate the problem 13, 
which however introduces additional ad hoc scale to the prob- 
lem. The impact of telescope beam and detector noise is small 
in practice for surveys like SPT (At — 6/iK-arcmin) and 
ACT (At — 10/iK-arcmin) as we numerically demonstrate in 
Sec. |IV] We explicitly show in AppendixlAlthat the delensing 
process suppresses the beam effect by a factor of the average 
magnification by clusters, since it corresponds to a mapping 
from the image plane to the source plane. Non-white instru- 
mental noise and boundary effect of detectors may affect the 
delensing process. However, compared to the survey area, the 
lensing signals are limited to a relatively small region around 
clusters where none of those effect is expected to be signifi- 
cant. 



IV. RECONSTRUCTING CLUSTER MASS PROFILES 

Here we use numerical simulations of the CMB and clus- 
ter lensing potential to demonstrate the applicability of our 
improved quadratic estimator to CMB experiments. First, 
we adopt a more realistic model for massive clusters and in- 
vestigate the dependence of our improved quadratic estima- 
tor on assumed initial models in Sec. IIV Al Then we recon- 
struct cluster mass profiles using the standard, modified, and 
improved quadratic estimators, and we compare their perfor- 
mance in Sec. IIV Bl Finally, we discuss the effects of contami- 
nants and investigate the robustness of our improved quadratic 



■ However, note that since our new estimator takes the result of the previous 



iteration as an initial model, another iteration makes use of T(n) that is 
constructed by using the initial model and this initial model is also a func- 
tion of T(n) in the previous iteration, which makes the estimator a rational 
function of temperature, instead of a quadratic function. Therefore, it is 
technically incon'ect to call it a quadratic estimator. 
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FIG. 2: (color online) Reconstructed convergence fields of a 30' x 30' 
region around a cluster at = 1 from an ideal experiment with 
At = 0. Cluster mass is set M = 5 x 10"/i"^Mo. Improved 
quadratic estimators are applied once with initial mass models of 
Minit = 5 X lO"/i"M/0 (left) and Mi„it = 1 x lO^/i^^Af© 
(right) to a single patch of sky. The bottom panels show the resid- 
ual after the true cluster convergence field is subtracted from the top 
panels. 



estimators in the presence of the Sunyaev-Zel'dovich (SZ) ef- 
fects. 



A. Improved Quadratic Estimator 

A singular isothermal model used in Sec. |III] is useful in 
developing an analytic solution of the likelihood approach. 
However, it is rather an academic model than a realistic model 
for massive clusters. Recent numerical simulations show that 
there exist a universal mass profile for dark matter halos, NFW 
profiles IH] 



p{r) 



r/rs{l + r/rsY 



(34) 



The scale radius is described by the concentration parame- 
ter c = Rvir/rg and the normalization coefficient ps is related 
to the mass of clusters M ~ 4TTr^ps[ln{l + c) — c/(l + c)]. 
We now use NFW profiles to model massive clusters. 

The convergence field K(n) of NFW profiles can be ob- 
tained by the ratio of the projected mass density I](r) to the 
critical surface density Scrit of the lensing cluster at zl, 



r 



S(r) 



-'crit 



^pf^)(l + zif, (35) 



where the functional form P{x) of the projected density is 
MM 



P{x) 



1 



a;2 - 1 
1 



tanh 



l-x 

1 + x 



(2^ = 1) 



1 - 



x^ - 1 



tan 



x-1 
x + l 



ix<l) 
(36) 

(a;> 1) 



and the critical sur face density = AttGDl[D^ - 

Z?i)/Z?*(l + zl) is only a function of zl given the precise 
measurement of D*. Note that the convergence field k 
of NFW profiles depend only on the angular separation 
— |n| due to spherical symmetry. The redshift de- 
pendence in Eq. ( |35] | arises due to our use of comoving 
coordinates, reflecting higher densities of the universe 
at zl > 0. For reference, Dl = and 
2400/i-iMpc, and Sent = 2.8 x lO^hM^pc^^ and 
1.8 X for = 0.3 and 1, respectively. For 

clusters of M = 5 x lO^/i-iM© and 1 x IO^^H-^Mq, 
i?vir = and 1.2/i~^Mpc appear subtended by 

3.'0 and 4.'9 on the skyat zl = 1 and 0.3. 

We use CMBFAST 12911 to generate CMB temperature maps 
of 200' X 200' (1000 X 1000 pixels) and set the pixel scale 
0.'2 smaller than detector beam sizes. Given a cluster mass Af 
and redshift z^, we first compute the convergence field K(fi) 
using Eq. ( l35T l. The lensing potential 0(n) and its deflection 
vector d(n) of the cluster are then computed in Fourier space, 
where their relations to K(fi) become a simple multiplication. 
The lensed temperature field r(ri) is computed by displacing 
the intrinsic temperature field T(fi) with d(n) according to 
Eq. Finally, we smooth T{n) with a telescope beam and 
add detector noises to obtain T°^^{n). Standard quadratic es- 
timators can be used to reconstruct a convergence field k{n) 
by using Eqs. ( flj] ), ( fT4b . and ( fTSb with f°^^{n), and so can 
modified quadratic estimators with a choice of ^cut, beyond 
which the integrand in Eq. ( fT3l ) is set zero. 

Similarly, our new estimation process begins with finding a 
solution s to the delensing equation s = n + V(/)™(ri) given 
the lensing potential 0™ (n) of an assumed initial model. We 
then construct a delensed temperature field T{s) — T°^^{n) 
and use the same equations with T°^^{h) replaced by T{s) to 
reconstruct kl- Imposing a consistency condition between the 
assumed model and the estimation result can provide a crite- 
rion for the iteration convergence of our improved quadratic 
estimators. 

ACT and SPT will find ~ 2 x 10'' massive clusters mainly 
by the spectral distortion of the CMB arising from the in- 
verse Compton scattering of hot electrons in clusters, so called 
the SZ effect i30l fsill . with roughly redshift-independent 
threshold mass M > 2 x IO^'^H^^Mq. To test our im- 
proved quadratic estimators, we consider a typical cluster of 
Af = 5 X lO^/i-^Afo and c = 3. Figure |2| shows the re- 
constructed k{h) of a massive cluster at = 1 in an ideal 
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FIG. 3: Dependence of reconstructed mass profiles on an initial mass 
model A/init- Thick and thin solid lines represent the true cluster 
mass profile and the mean of reconstructed mass profiles from 500 
clusters. The mass profiles are obtained by averaging reconstructed 
convergence over the annulus of each cluster. The uncertainties in 
the mean profile are shown as shaded regions. Dashed lines show an 
assumed initial mass model and the cluster virial radius is shown as 
vertical dotted lines. In Panels (c) and (d), the initial mass models 
are taken as the mean mass profile from the previous iteration. The 
reconstruction quickly converges to the true mass profile in two iter- 
ations even with an incorrect choice of A/init = 1 x 10^'^ Mq, 
exhibiting no detectable bias in an ideal experiment. 



experiment with = 0. Here we simply adopt a NFW 
profile with fixed concentration c — 3 for our initial model 
and allow mass Mjnit of the model to vary. Even with fixed 
concentration, changes as a function of A/jnit, and hence 
our assumption allows for changes in the shape as well as the 
scaling of initial mass models. However, note that while we 
use this parametrized model of clusters, our reconstruction is 
general and non-parametric, such that we recover 2-D struc- 
ture of K(n) at each pixel rather than obtain model parame- 
ters AI and c (see ll32i l33ll for reconstructing a parametrized 
cluster model). We assume that the cluster center is known 
from other observations with uncertainty less than our pixel 
scale 0.'2. The upper panels show the reconstructed K(n) from 
our improved quadratic estimator using an initial model of 
Mi^t = 5 X lO^^h-^MQ (left) and 1 x lO^^h'^MQ (right), 
and the bottom panels show the residual after the true K(n) is 
subtracted from the top panels. 

With the perfect initial model in the left panels, the delensed 
temperature field T(n) is identical to the intrinsic r(n), and 
our improved quadratic estimator returns no change on aver- 
age to the initial model (bottom). However, there exist ran- 
dom noises in K(n) over the map, arising from the fluctua- 
tions of the intrinsic temperature gradient, though they are ev- 
idently small and discernible from the massive cluster (top). 
In the right panels, T(n) is delensed with the imperfect initial 
model, so that r(n) is not identical to T(n) but the lensing 



effect is significantly reduced. In this regime, quadratic esti- 
mators become asymptotically optimal and reconstruct ^(n) 
unbiased. The top panel exhibits small anisotropy and some 
residual remains in the bottom panel. In a single patchy of the 
sky, the CMB anisotropy has a gradient direction and gravita- 
tional lensing of the CMB makes no difference orthogonal to 
the gradient direction, in which reconstruction is completely 
degenerate, resulting in the asymmetry in k{h). However, 
since the CMB has no preferred direction, this obstacle can be 
overcome by stacking clusters in different patches of the sky. 
In practice, this stacking process provides the average ^(n) 
of the clusters, or the cluster-mass cross-coiTelation function 
lITill . Hereafter we assume that identical clusters are stacked 
for simplicity. 

We now quantify the ability to reconstruct K(n) with vary- 
ing accuracy of assumed models. Figure |3] plots the recon- 
structed cluster mass profiles from 500 clusters (thin solid). 
The mass profiles are obtained by averaging reconstructed 
k{h) over the annulus of each cluster, and the uncertainties 
in the mean mass profile are shown as shaded regions. Fig- 
ure [3}i shows that our improved quadratic estimator is unbi- 
ased when our assumed model is perfect; it recovers the true 
model (thick solid) with no bias. If an assumed initial model is 
significantly different from the true model in Fig.fS}*, the im- 
proved quadratic estimator suffers from the same problem that 
the standard quadratic estimators have, and the reconstruction 
is again biased low when the residual lensing effect is large. 
However, the reconstructed ^(n) is inconsistent with our as- 
sumed model (dashed), implying that it has not converged to 
the coiTect solution. In Fig.|3]: we take the reconstructed k.{n) 
as a new initial model and apply our improved quadratic es- 
timator to the same clusters. The reconstructed k{h) is now 
close to the true ^(n), but still inconsistent with the assumed 
model. We iterate once more in Fig.|3]i and the reconstructed 
k{h) is identical to the true K(n). One more iteration results 
in no further change and the estimate is consistent with the 
assumed and also the true models, indicating the convergence 
of our estimates. 

Even with the imperfect initial model, the reconstruction 
quickly converges to the true ^(n) and no significant bias de- 
velops even beyond i?vir (dotted). When the reconstructed 
k{n) is inconsistent with the assumed model, one can in prin- 
ciple adopt a different initial model for a faster convergence 
before applying the estimator iteratively. Note that the asym- 
metry seen in Fig. |2] disappears and the reconstructed ^(n) 
restores symmetry, once many clusters are stacked. Further- 
more, the uncertainties in the mean profile decrease as our 
assumed model converges to the true model, because it solely 
results from the intrinsic fluctuations of the CMB in the case 
of perfect delensing. 



B. Performance Comparison 

Before we assess the performance of the three lensing es- 
timators in realistic experiments, we first compare our im- 
proved quadratic estimator to the standard quadratic estima- 
tor, when the lensing effect is small. Figure |4] plots the re- 
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FIG. 4: Mass profile reconstruction for low mass clusters of M — 
1 X lO^'^h"^ Mq at zl ~ 0.3 from standard (sQE) and improved 
(iQE) quadratic estimators (in the same format as in Fig.[3ll. 10,000 
(left) and 1000 (right) clusters are used to obtain the mean profile, 
and the shaded region shows the uncertainties in the mean profile. 
Both estimators recover the true mass profiles within i?vir in the low 
mass regime. Approximately ten times more clusters are needed for 
sQE to achieve the same accuracy than for iQE. However, for com- 
parison we plot the mean profile from 1000 clusters as the dot-dashed 
line in the left panel. 



constructed cluster mass profiles in the same format as Fig. [3] 
For clusters of M = 1 x lO^^h-^MQ at zl = 0.3 (k < 1), 
the improved quadratic estimator recovers the true mass pro- 
file with no detectable bias after two iterations. With signals 
smaller by a factor of five than in Fig. [3] 1000 clusters are 
stacked to obtain the mean mass profile, while 10,000 clus- 
ters are required for the standard quadratic estimator. As 
we quantify the difference in the signal-to-noise ratio be- 
low, the standard quadratic estimator needs approximately 
ten times as many clusters as the improved quadratic esti- 
mator needs to achieve the same accuracy, but we show the 
mean profile (dot-dashed) obtained by applying the standard 
quadratic estimator to 1000 clusters for comparison. Once 
enough clusters are stacked, the standard quadratic estimator 
works well within i?vir, though it shows some hint of devia- 
tion at the core. Thus, the standard quadratic estimator may 
be safely used to reconstruct mass profiles of clusters with 
M < 1 X lO^^'/i^^M© at Zl = 0.3. However, given the 
source of the CMB at — 1090, the lensing effect becomes 
larger as increases, until Ecrit reaches the minimum at 
Zl — 2.5, where Dl becomes a half of D^,. Therefore, the 
standard quadratic estimator cannot be used to reconstruct un- 
biased mass profiles of clusters that are either at zl > 0.3 or 
massive M >lx lO^/i-^M©. Since ACT and SPT will find 
clusters of M > 2 x 10^'^h~^MQ at higher redshift, modified 
or improved quadratic estimators are preferred to the standard 
quadratic estimator. 

Now we consider realistic experiments with dpix = 5/iK 
and compare the performance of the lensing estimators in 
Fig. |5] Since the reconstruction becomes noisier in the pres- 
ence of detector noise and telescope beam, 10,000 clusters are 
stacked for the mean mass profiles when the standard or mod- 
ified quadratic estimator is used, while the improved quadratic 
estimator is iteratively applied to only 1000 clusters. For clus- 
ters of M = 5 X lO^'^/i'^Af© at zl = 1, Fig.|5}i shows that 
the standard quadratic estimators become substantially biased 



in a region around massive clusters, consistent with the pre- 
vious results lfT3i [T4I1 . Quadratic terms in 0i ignored in the 
linear approximation coherently contribute to k\, and hence 
the reconstructed k{h) is biased low where the linear approx- 
imation is violated 1,14.1 . 

Next we consider a modified quadratic estimator in Fig. jS}* 
and adopt ^cut = 1500. The modified quadratic estimator re- 
covers the true mass profile within i?vir but with small devia- 
tion beyond i?vir- The modified quadratic estimators operate 
in the same way of the standard quadratic estimators, except 
signals are removed on small scales (/ > Icut), where the lin- 
ear approximation is violated. However, the choice of /cut is 
rather arbitrary and should be calibrated against simulations: 
lower /cut is needed for more massive clusters. Note that the 
modified quadratic estimator with /cut — > 00 exactly reduces 
to the standard quadratic estimator (in practice /cut ^ 
can achieve this limit because of the Silk damping). In other 
words, a modified quadratic estimator with /cut — iC fails to 
reconstruct the mass profile (born out by Fig.lSji). Moreover, 
we had to adopt /cut = 1500 to reconstruct the mass profile 
in Fig. Is}* and|5]i, a more aggressive choice than /cut = 2000 
proposed in 1 14], with which we cannot recover the mass pro- 
file. This reflects the sensitivity of the modified quadratic es- 
timator to /cut as a function of cluster mass. Larger number 
of clusters are also required to reconstruct the true mean mass 
profile due to the reduction in the signal-to-noise ratio. 

Figure |5]: shows the reconstruction by our improved 
quadratic estimator with Afinit = 1 x 10^^/i~^Mo. The im- 
proved quadratic estimator recovers the true mass profile with 
no significant bias in the presence of detector noise. After 
a few iterations, the estimates quickly converge to the true 
model and the scatter around the mean is greatly reduced com- 
pared to Fig.lSt*. Note that we iteratively applied the improved 
quadratic estimator to the same 1000 clusters. 

In Fig.|5}i and|5^, we consider the effect of telescope beam 
with (?FWHM = 0.'5. Both estimators in Fig.lSji and|5l; re- 
cover the true mass profile unbiased in the presence of detector 
beam, while there exist some deviations in both cases. How- 
ever, note that we explicitly account for the beam effect using 
the formulas developed in Sec. IIIBI rather than deconvolve 
the beam before applying the lensing estimators. The latter 
approach often used in the literature suffers from deconvolved 
detector noise exponentiating on small scales. This problem 
requires a low-pass filtering of reconstructed ^(n), addition- 
ally removing the signals below the beam scale, which results 
in a distortion of its shape of k{h), making it hard to com- 
pare directly to theoretical predictions. However, in reality 
beam convolution suppresses detector noises (of course lens- 
ing signals as well), and it simply makes the reconstruction 
noisy below the beam scale. The dot-dashed line in Fig. |5}i 
contrasts the reconstruction when we explicitly remove ^(n) 
at / > l/(Tb, where ^(n) is obtained by applying the modi- 
fied quadratic estimator with beam-deconvolved data (the line 
is displaced to avoid confusion with other lines). Significant 
shape distortion in k{h) complicates the interpretation. 

For a larger beam size comparable to the scale radius of 
the clusters (^fwhm — 1')^ the reconstruction becomes more 
challenging: modified quadratic estimators cannot recover the 
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FIG. 5: Comparison of reconstructed mass profiles from standard (sQE), modified (mQE), and improved (iQE) quadratic estimators in realistic 
experiments with dpix — SfiK. The reconstruction is more difficult in the presence of detector noise and telescope beam. For the mean 
of reconstructed mass profiles, 10,000 clusters of M = 5 x IO^^-^Mq at zl = 1 are stacked when sQE or mQE is used, while iQE 
is iteratively applied to only 1000 clusters. The shaded regions show the uncertainties in the mean profile. The dot-dashed line (panel d) 
shows the shape distortion in k{h) when mQE is applied after beam-deconvolution, and the line is displaced to avoid confusion (see the text). 
With &FWHM = 1' (panel /), iQE can recover the mean mass profile with small bias below the beam scale. For comparison, we plot the 
reconstructed mass profile (dot-dashed) using mQE in Panel (/). 



cluster mass profile without significant shape distortion (dot- 
dashed). The improved quadratic estimator in Fig.|5]f recov- 
ers the true mass profile beyond i?vir, while it develops small 
bias below the beam scale. 

Figure |6] plots the fractional difference between the lens- 
ing estimates and the true cluster mass profile in Fig.|5] com- 
paring their uncertainty in the mean profile. The difference 
(lines) is computed from the mean mass profiles by stacking 
10,000 clusters for both estimators, while the statistical uncer- 
tainty (gray bands) in the difference is scaled for 500 clusters 
for comparison. The left panel shows that both estimators re- 
cover the cluster mass profile at the 5% level or better in the 
absence of telescope beam, while the modified quadratic esti- 
mator may need fine-tuning of Icut to achieve better accuracy. 
However, the difference in their measurement uncertainty is in 
stark contrast: the improved quadratic estimator has a signifi- 
cantly higher signal-to-noise ratio than the modified quadratic 
estimator. While the reconstruction becomes harder especially 
beyond i?vir in the presence of telescope beam shown in the 
right panel, this trend of signal-to-noise ratio difference per- 
sists. Note that due to the beam smoothing effect the uncer- 
tainty in the estimates at 6* < (?fwhm is reduced while it is 
highly correlated among adjacent bins. 

So far we have numerically demonstrated the performance 
of the lensing estimators in Figs. |5] and |6] standard quadratic 



estimators are significantly biased; modified and improved 
quadratic estimators recover the cluster mass profile with no 
bias, while they show substantial difference in the number of 
clusters that is required to obtain the mean mass profile. To 
quantify this difference, we evaluate Ax^ of each lensing es- 
timator 

Ax' = ^A^(0)Cri(0,0')«(^'), (37) 
e,e' 

where the covariance matrix of k{d) is 

Ck{e, e') = {[k{e) - K{e)] [k{e') - K{e')]) . (38) 

Since ^(fi) is computed from the two Wiener-filtered func- 
tions of the CMB temperature anisotropies, the covariance 
matrix is non-diagonal. The finite width of the convolution 
filter H{h) in Eq. ( fTTl ) also reflects that the lensing estimators 
are a non-local function of the CMB temperature anisotropies, 
and hence non-zero when 0^9'. 

In the absence of telescope beam in Figs.|5j',|5]:;, and|6li, the 
ratio of /S.x^ for the modified quadratic estimator relative to 
the improved quadratic estimator is 8. 1 : a factor of eight larger 
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FIG. 6: Fractional difference between the lensing estimates and the 
true cluster mass profile in Fig.[5] The difference (lines) is computed 
from the mean mass profiles obtained by stacking 10,000 clusters for 
both estimators, while the statistical uncertainty (gray bands) in the 
difference is scaled for 500 clusters. The vertical dotted lines show 
the cluster virial radius. 



number of clusters is required for the modified quadratic esti- 
mator to achieve the same level of accuracy than that for the 
improved quadratic estimator In the presence of telescope 
beam in Figs. |5}i, |5^, and|6f), beam smoothing substantially 
degrades the ability to recover the true cluster mass profile for 
both estimators, and its effect is relatively larger for the mod- 
ified quadratic estimator, increasing the ratio to 10.4. 



C. Sunyaev-Zel'dovich Effects 

On small scales (/ > 2000), the primordial CMB tempera- 
ture anisotropics decay exponentially due to the Silk damping 
drtl and the dominant source of secondary anisotropics is the 
thermal Sunyaev-Zel'dovich (tSZ) effect, arising from scat- 
tering off hot electrons in massive clusters. However, the tSZ 
effect imprints a unique frequency dependence in the CMB 
temperature anisotropics, which in principle can be used to 
remove the tSZ signals. The same Compton scattering pro- 
cess also gives rise to a Doppler effect in the CMB tempera- 
ture anisotropics due to the bulk motion of electron gas, or the 
kinetic Sunyaev-Zel'dovich (kSZ) effect (see ll34l [ssll for re- 
cent reviews). These kSZ signals, albeit smaller than tSZ sig- 
nals, are spectrally indistinguishable from the intrinsic CMB 
temperature anisotropics, introducing an artifact in the lens- 
ing reconstruction. Here we assume that the tSZ signals can 
be cleaned perfectly, and we investigate how the kSZ signals 
deteriorate the lensing reconstruction. 

For simplicity, we assume that the gas density traces the 
dark matter distribution in a massive cluster, with the same 
NFW profile. Given the line-of-sight velocity v\os of the clus- 
ter, the kSZ effect results in temperature anisotropics 

AT(^) = -«ios Tie) TcMB = -ATksz (39) 



where t{8) is the Thompson scattering optical depth, propor- 
tional to the projected density Yi{r = Dl 0). We parametrized 
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FIG. 7: (color online) Cluster lensing and kinetic Sunyaev- 
Zel'dovich (kSZ) effects on the CMB. For comparison, we plot 
6' X 6' regions of CMB temperature maps around a cluster of 
M = 5 X 10" /i"^ A/0 (Svir = 3'.0) at = 1. Upper panels: 
lensed temperature map (left) and its difference from the intrinsic 
temperature map (right). Bottom panels: assuming that the cluster is 
moving toward an observer, the kSZ effect is set AT^sz = 3 (leji) 
and 15/iK (right) at the center. The color scales in each panel repre- 
sent the same temperature except in the upper right panel, where the 
color represents the difference ranging from — to 5/iK. 



the product of v\os and r(0) as ATksz- Note that since the in- 
trinsic CMB and kSZ induced anisotropics dilute in the same 
way as the universe expands, there is no (1 + zl) factor in 
Eq. (O and Tcmb = 2.725K is the CMB temperature today. 

For a typical cluster with electron number density ^ 
0.01 cm^'^ and core radius ^ 100 kpc, the Thompson scat- 
tering optical depth is r(0) = 2 x lO^'' at the core. The rms 
velocity dispersion in linear theory is a.^ = 1.3 x 10~'^(= 
390 kms~^) at zl = 1, and this results in the rms tempera- 
ture fluctuation ATksz = 3.7/iK at the core. We randomly 
draw AT(0) from a Gaussian distribution with zero mean and 
dispersion a — ATkSZ, then we add AT(fi) to r(fi) for ob- 
servations of each cluster 

First, we compare the cluster lensing and kSZ effects on 
the CMB temperature field. Figure [7] plots a 6'x6' regions 
of CMB maps around a cluster of M = 5 x lO^-^h'^MQ 
(6'vii — 3'.0) at zl — 1. The top panels show the lensed tem- 
perature field (left) and the difference from the intrinsic tem- 
perature field (right). Gravitational lensing imprints dipole- 
like wiggles in the CMB map on top of the smooth large-scale 
gradient field. Perpendicular to the gradient direction there ex- 
ists no temperature change and hence lensing reconstruction 
is degenerate along the direction. The bottom panels show the 
kSZ effect with ATksz = S^iK (left) and IbfiK (right). We 
assume that the cluster is moving toward the observer With 
the small optical depth in the left panel, the kSZ effect is rel- 
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FIG. 8: Impact of kinetic Sunyaev-Zel'dovich (kSZ) effects on the 
mass profile reconstruction. Assuming that the gas distribution traces 
the dark matter distribution in clusters, the kSZ effect is computed 
by assigning a Gaussian random velocity to each cluster with rms 
temperature change AT^gz = 3 {left) and 15^K (right) at the center, 
respectively. 



atively small compared to the lensing effect. Larger optical 
depth in the right panel substantially enhances the kSZ effect, 
dominating over the lensing effect at the center. However, 
since the lensing effect is much less concentrated than the kSZ 
effect as the dipole-like wiggles peak at a few scale radii (top 
right), the reconstruction is stiU possible. 

Figure[8]shows the impact of the kSZ effect on reconstruct- 
ing mass profiles. For clusters of i\/ = 5 x IO^'^H^^Mq 
at = 1 in an experiment with ^?fwhm = 1' and tTpix — 
5/iK, we iteratively use improved quadratic estimators with 
-Minit = 1 X 10^^/i~^AfQ. The mean and the uncertainties 
are computed from 1000 clusters. Figure |8]2 shows that the 
kSZ effect with ATksz = 3fiK has relatively little impact on 
the reconstruction: the kSZ effect becomes negligible beyond 
rs because the density profile declines (the gas density 
in reality would be steeper and more confined to the center 
than we assumed here). The lensing effect, on the other hand, 
is sensitive to the deflection field and remains strong beyond 
rs, declining less rapidly than the kSZ effect |29]. In Fig.|8}), 
we consider a larger kSZ effect with ATksz — 15/xK, ex- 
pected either from higher electron number density or from 
higher matter fluctuation normalization erg cx a^. With the 
temperature anisotropics comparable to the lensing effect, the 
reconstruction becomes difficult and it starts to develop bias 
around i?vir as ATksz increases. Note that the bias at the cen- 
ter is largely due to the telescope beam effect. 

In the presence of contaminants such as residual foreground 
or tSZ effect, radio point sources, and large kSZ effect, the 
lensing estimators based on temperature anisotropics need 
to be complemented by using lensing estimators based on 
combination of temperature and E- and B-mode polarization 
ifioll . since there exists no significant source of contamina- 
tion that mimics the intrinsic CMB polarization. Furthermore, 
the unique relation between the E- and B-mode polarization 
signals ||36[ IstIi can be used to provide a robust consistency 
check. However, measurements of the lensed polarization 
fields would require an experiment with higher angular res- 
olution and sensitive detectors than experiments that are cur- 
rently available. 



V. DISCUSSION 

Weak gravitational lensing of the CMB gives rise to a devia- 
tion of the two-point correlation function of the CMB temper- 
ature anisotropics from otherwise statistically isotropic func- 
tion. Quadratic estimators [11] have been widely used to re- 
construct cluster mass profiles and large-scale structure by 
measuring the induced anisotropics in the two-point correla- 
tion function. We have shown that standard quadratic estima- 
tors become optimal in the limit of no lensing, saturating the 
Cramer-Rao bound, while they become progressively biased 
and sub-optimal as the lensing effect increases. Especially for 
clusters that can be found by the ongoing SZ surveys like ACT 
and SPT, the standard quadratic estimators start to be biased 
at zl — 0.3, and at higher redshift, where the lensing effect is 
larger, other estimators should be used to reconstruct cluster 
mass profiles. 

It is recently proposed [14] that this obstacle in the stan- 
dard quadratic estimators can be overcome by explicitly re- 
moving the signals in the CMB temperature gradient field 
at / > /cut, where the lensing effect is large in violation of 
the linear approximation. However, although these modified 
quadratic estimators recover cluster mass profiles with no sig- 
nificant bias, the choice of Icut is somewhat arbitrary and it 
depends on the lensing effect, which requires prior calibra- 
tions against numerical simulations before one can apply the 
modified quadratic estimators to CMB maps. 

We have developed a new maximum likelihood estimator 
for reconstructing cluster mass profiles and large-scale struc- 
ture. We first construct a CMB temperature field by de- 
lensing the observed temperature field based on an assumed 
mass model. We have proved that the delensed temperature 
field is close to the unlensed temperature field with telescope 
beam smoothed and detector noise added, if the assumed mass 
model is a good approximation to the true mass model. The 
delensed temperature field can then be used to set up the like- 
lihood of the CMB, and our new estimator that maximizes this 
likelihood takes a similar form of the standard quadratic esti- 
mators, because it approaches to an optimal estimator as the 
assumed model becomes the true model. Our maximum like- 
lihood estimator can be iteratively applied as we update the as- 
sumed mass model, until it converges (to the true model) and 
the estimate is consistent with the assumed model. Our max- 
imum likelihood estimator, named as an improved quadratic 
estimator, is easy to implement in practice and it has no free 
parameter. 

Our improved quadratic estimators quickly converge to the 
true mass model after a few iterations, even when an as- 
sumed initial model is significantly different from the true 
model. When the estimate is inconsistent with the assumed 
model, one can adopt another initial model for iterations for 
faster convergence of the improved quadratic estimators. The 
telescope beam and detector noise renders the reconstruction 
harder, but we have demonstrated that the improved quadratic 
estimators recover cluster mass profiles with a beam size com- 
parable to the cluster scale radius. Furthermore, our new esti- 
mator significantly improves the signal-to-noise ratio over the 
standard or modified quadratic estimators by a factor of ten in 
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number of clusters, because when an assumed model is close 
to the true mass model, the only source of noise for our es- 
timator is the intrinsic fluctuations of the CMB temperature 
gradient. 

We have tested the robustness of the improved quadratic es- 
timators in the presence of the kSZ effect. The kSZ distortion 
^Tksz < 15^K at the center results in relatively small bias 
in the reconstructed cluster mass profiles. However, since the 
optical depth is a function of electron number density in the 
clusters, it is related to the true mass profile. Therefore, we 
could take a more aggressive approach to modeling kSZ sig- 
nals from an assumed initial mass model and subtract the kSZ 
contributions before applying improved quadratic estimators. 
Furthermore, this template for kSZ signals can also be itera- 
tively refined as we update our assumed mass model. 

Since the reconstruction is non-parametric, it is not limited 
to spherical clusters, while stacking many clusters ensures that 
irregular shapes of individual clusters become irrelevant. Sim- 
ilar arguments can be applied to projection effects: each clus- 
ter can be located at a line-of-sight with overdense or under- 
dense regions, but projection effects become negligible once 
many lines-of-sight are combined. Given a sample of clusters 
from SZ surveys, the average mass profile of stacked clus- 
ters would provide a cluster-mass cross-correlation function, 
which can be used to measure the growth rate of structure, 
probing the evolution of dark energy, instead of individual 
cluster mass profiles. 

However, in reality it would be harder to reconstruct clus- 
ter mass profiles than considered here, because there exist 
other contaminants such as point radio sources and residual 
foreground and/or tSZ effect, and other complications such 
as non-isolated clusters and internal bulk motion of gas in 
clusters. However, additional information from polarization 
measurements may be used to overcome some of the diffi- 
culties, given the unique relation between the E- and B-mode 
polarization signals and relatively negligible primary and sec- 
ondary contaminants. Finally we mention that our improved 
quadratic estimators can be applied to reconstruct large-scale 
structure, while in this regime standard quadratic estimators 
can be used without significant bias. 
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lensing potential (j)"^{n) of an assumed mass model, the lens- 
ing equation relates an image position n to a source position 
s"'' — h + V(/)™(n). Here we keep the superscript rn to indi- 
cate the relation to the assumed model. The true source posi- 
tion is then s = n + V(/)(n), where 0(n) is the true lensing 
potential. Now we construct a delensed temperature field 



(Al) 



= / d^m B{m - n) r(m) + r^(n) 



where B{m) is the telescope beam function. Since the 
lensing equation is not analytically invertible in general, we 
keep both s™ and n, but note that they are not independent 
variables. In Fourier space, the delensed temperature field is 
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with a contribution from the CMB 
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and a contribution from the detector noise 
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The lensed temperature is T(n) — T(i) and its Fourier 
mode is 
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With the linear approximation, one can expand the exponen- 
tial term to the first order in and this equation reduces to 
Eq. (O. However, we keep the equation as general as possible 
to be valid, even when the lensing effect is large. Substituting 
Tij in Eq. iA3i and changing the integration variable n to s™ 
gives 



APPENDIX A: DELENSED TEMPERATURE FIELD 



Here we derive a relation Tl ~ Ti e 2' '^6 -| 
presence of telescope beam and detector noise. 
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FIG. 9; Effects of telescope beam and detector noise on the delensing 
process. The top panel compares Ti (thin) with Ti e~2' -\- 
(thick) in terms of their power spectrum, and the bottom panel shows 
the fractional deviations. The vertical dotted line represents the beam 
scale I = l/(7(,. CMB experiments with ^fwhm = 1' and (Tpix = 
5/xK are considered for clusters of i\f = 5 x IO^'^/i^^Mq at zl ~ 1. 
The noise only case is largely obscured by the solid line. 



Given the lensing potential <j>{n) (analogously for (f)™{h)), 
the Jacobian is related to the distortion matrix 



dn^ 
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1 + VV0 
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and its inverse is the lensing magnification. 

For a Gaussian beam B\ — exp [^^'^Cb] , we can integrate 
over the beam factor 



Ti' = I d% Ti, 



^„ 2^1 i(l2-n2-l-n) i[l2-V0(n2)-l-V0™(n) 
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Now we parametrize 112 by a dimensionless displace- 
ment vector A centered at n (i.e., n2 = n + (TbA). The 
Gaussian beam factor guarantees that the integrand is non- 
vanishing only when A = | A | is small. In order to get more 
intuition, we expand (j>{n2) — 0(n) + W(f>{h) ■ (TbA to the 
linear order in A, and integrating over A gives 
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This is the final expression for the delensed temperature 
field. The first exponential term of the integrand controls the 
delensing process: when the assumed model is close to the 
true model after a few iterations (0™(n) ~ (/'(n), s™ ~ s), 
the integral becomes a Dirac delta function and — T\, 
when the beam smoothing is negligible. The distortion 
matrix is close to the identity matrix beyond i?vir and 

^ e 1 i2 2 

T{ ~ Ti e---' Around massive clusters, the distortion 
matrix deviates from the identity matrix and its determinant 
becomes smaller than one, making the exponential factor 
unity. This reflects that the beam size is reduced by a mapping 
from the image plane to the source plane, and practically 
ff ~ Ti e'i^'^b with CTh < ab. 

For a white detector noise, the delensed detector noise is 
simply the redistributed white noise. However, since the de- 
lensing process alters the unit area on the sky, it becomes non- 
white but its deviation is confined to relatively small region; 
the noise power spectrum is 



d% I d% (T,f *) 
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It is the Jacobian of the distortion matrix that makes 
white noise non-white in a region around massive clusters. 
Outside i?vir, where the Jacobian is near unity, the integral 
becomes a Dirac delta function and the noise is again white. 

Figure |9] compares our delensing (TI: thin) and perfect de- 
lensing (T\ e~2^ "b : thick) processes in terms of their 
power spectrum. In the absence of detector noise (dashed), 
Tj starts to deviate from T\ e 2 ' "^i. around the beam scale 
I ~ 1 /db (vertical dotted), declining less rapidly. On scales 
below the beam scale, our approximation (A <C 1) breaks 
down and M^^(n) differs from the identity matrix, leading 
to the excess power However, at this scale, signals are dom- 
inated by the detector noise (solid). Since detector noises are 
unaffected by the beam distortion when delensed, the devia- 
tion of from is relatively mild and it is solely due to 
the (inverse) magnification effect of the mapping from the im- 
age plane to the source plane. The noise only case (dotted) 
is largely obscured by the solid line. In summary, telescope 
beam and detector noise has little impact on our delensing 
process at scales larger than the beam scale, where most of 
the information is contained. 
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